Accessing a Large Multimodal Corpus Using an Automatic Content Linking Device

نویسندگان

  • Andrei Popescu-Belis
  • Jean Carletta
  • Jonathan Kilgour
  • Peter Poller
چکیده

As multimodal data becomes easier to record and store the question arises as to what practical use can be made of archived corpora, and in particular what tools allowing efficient access to it can be built. We use the AMI Meeting Corpus as a case study to build an automatic content linking device, i.e. a system for real-time data retrieval. The corpus provides not only the data repository, but is used also to simulate ongoing meetings for development and testing of the device. The main features of the corpus are briefly described, followed by an outline of data preparation steps prior to indexing, and of the methods for building queries from ongoing meeting discussions, retrieving elements from the corpus and accessing the results. A series of user studies based on prototypes of the content linking device have confirmed the relevance of the concept, and methods for task-based evaluation are under development.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

متن کامل

BECAM tool - a semi-automatic tool for bootstrapping emotion corpus annotation and management

Corpus annotation is an important aspect in speech applications where stochastic models need to be trained and evaluated. Multimodal corpora are also annotated. Moreover, corpus annotation is an essential phase in the construction of emotion recognizer engines. Large corpora, as they are essential to construct representative knowledge bases, have been a problem for corpus annotators. Time consu...

متن کامل

Cicero - Towards a Multimodal Virtual Audience Platform for Public Speaking Training

Public speaking performances are not only characterized by the presentation of the content, but also by the presenters’ nonverbal behavior, such as gestures, tone of voice, vocal variety, and facial expressions. Within this work, we seek to identify automatic nonverbal behavior descriptors that correlate with expert-assessments of behaviors characteristic of good and bad public speaking perform...

متن کامل

A personal digital assistant as an advanced remote control for audio/video equipment

This paper describes a personal digital assistant that is used as a catalogue and advanced remote control to browse, select and play music in a compact disc jukebox. The application has been developed as a research prototype to identify advantages and disadvantages of different interaction styles for accessing large amounts of content. The basic concept provides easy access to a personal music ...

متن کامل

Proofs as discourse: an empirical study

Computer-based logic proofs are a form of ’unnatural’ language discourse, but the structure and process of proof can be observed in considerable detail, and analysis is leading to a number of general insights. We have been studying how students respond to multimodal logic teaching. First, psychological measures indicate that students’ pre-existing cognitive styles have a sigaificant impact on t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009